AITopics | pre-trained transformer model

Collaborating Authors

pre-trained transformer model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

5b5618e7d061748267d74478b7c5b1ab-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 06:12:58 GMT

arxiv preprint arxiv, quantization, transformer model, (12 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)

Genre: Research Report (0.95)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

Deep Compression of Pre-trained Transformer Models

Neural Information Processing SystemsDec-24-2025, 06:47:16 GMT

Pre-trained transformer models have achieved remarkable success in natural language processing (NLP) and have recently become competitive alternatives to Convolution Neural Networks (CNN) and Recurrent Neural Networks (RNN) in vision and speech tasks, respectively. Due to excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data; however, model sizes can grow tremendously. As high performance, large-scale, and pre-trained transformer models become available for users to download and fine-tune for customized downstream tasks, the deployment of these models becomes challenging due to the vast amount of operations and large memory footprint. To address this challenge, we introduce methods to deeply compress pre-trained transformer models across three major application domains: NLP, speech, and vision. Specifically, we quantize transformer backbones down to 4-bit and further achieve 50% fine-grained structural sparsity on pre-trained BERT, Wav2vec2.0

deep compression, name change, pre-trained transformer model, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Add feedback

Deep Compression of Pre-trained Transformer Models

Neural Information Processing SystemsAug-15-2025, 02:17:42 GMT

Due to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data at the expense of tremendous growth in model size.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)

Genre: Research Report (0.95)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

Convolutions are Competitive with Transformers for Encrypted Traffic Classification with Pre-training

Lin, Chungang, Zhang, Weiyao, Zuo, Tianyu, Zha, Chao, Jiang, Yilong, Meng, Ruiqi, Luo, Haitong, Meng, Xuying, Zhang, Yujun

arXiv.org Artificial IntelligenceAug-5-2025

Encrypted traffic classification is vital for modern network management and security. To reduce reliance on handcrafted features and labeled data, recent methods focus on learning generic representations through pre-training on large-scale unlabeled data. However, current pre-trained models face two limitations originating from the adopted Transformer architecture: (1) Limited model efficiency due to the self-attention mechanism with quadratic complexity; (2) Unstable traffic scalability to longer byte sequences, as the explicit positional encodings fail to generalize to input lengths not seen during pre-training. In this paper, we investigate whether convolutions, with linear complexity and implicit positional encoding, are competitive with Transformers in encrypted traffic classification with pre-training. We first conduct a systematic comparison, and observe that convolutions achieve higher efficiency and scalability, with lower classification performance. To address this trade-off, we propose NetConv, a novel pre-trained convolution model for encrypted traffic classification. NetConv employs stacked traffic convolution layers, which enhance the ability to capture localized byte-sequence patterns through window-wise byte scoring and sequence-wise byte gating. We design a continuous byte masking pre-training task to help NetConv learn protocol-specific patterns. Experimental results on four tasks demonstrate that NetConv improves average classification performance by 6.88% and model throughput by 7.41X over existing pre-trained models.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2508.02001

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Compression of Pre-trained Transformer Models

Neural Information Processing SystemsFeb-5-2025, 00:40:42 GMT

deep compression, downstream task, pre-trained transformer model, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Trap (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Paper Review: Summarization using Reinforcement Learning From Human Feedback

#artificialintelligenceDec-22-2022, 00:06:26 GMT

OpenAI's ChatGPT is the new cool AI in town and has taken the world by storm. We've all seen countless Twitter threads, medium articles, etc., that highlight the different ways ChatGPT can be used. Some developers have already started to build applications, plugins, services, etc., that leverage ChatGPT. While the exact workings of ChatGPT aren't yet known since OpenAI hasn't released a paper or open-sourced their code yet. We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup.

agent, reinforcement learning, reward model, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Add feedback

A Dynamic Web App Using Pre-trained Transformer Models for Sentiment Analysis and Text…

#artificialintelligenceMay-1-2022, 23:30:51 GMT

Transformers are one of the most exciting concepts in Natural Language Processing. This article is a guide on working with pre-trained models that use transformers. A transformer model is a neural…

pre-trained transformer model, sentiment analysis, text summarization, (5 more...)

#artificialintelligence

Industry: Information Technology > Software (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.50)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.50)

Add feedback

An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification

Demirkıran, Ferhat, Çayır, Aykut, Ünal, Uğur, Dağ, Hasan

arXiv.org Artificial IntelligenceJan-2-2022

Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Thus, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely utilized features by machine and deep learning models for malware classification as these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships between API calls. On the other hand, the transformer-based models process sequences as a whole and learn relationships between API calls due to multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that the transformer model with one transformer block layer surpassed the widely used base architecture, LSTM. Moreover, BERT or CANINE, pre-trained transformer models, outperformed in classifying highly imbalanced malware families according to evaluation metrics, F1-score, and AUC score. Furthermore, the proposed bagging-based random transformer forest (RTF), an ensemble of BERT or CANINE, has reached the state-of-the-art evaluation scores on three out of four datasets, particularly state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark dataset.

api call sequence, classification, dataset, (13 more...)

arXiv.org Artificial Intelligence

2112.13236

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Nepal (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback